This is an example notebook that walks through the construction of the atlas.
import numpy as np
import pandas as pd
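import sklearn.decomposition  # a bare `import sklearn` does not load the decomposition submodule used for PCA below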
import gc
import functions
import scipy
# Map each 'Sample Source' value to its plotting colour (first column of the colours file)
blood_atlas_colours = pd.read_csv('/Users/pwangel/Data/Metadata_dumps/imac_atlas_colours.tsv', sep='\t').set_index('Sample Source')
blood_atlas_colours = blood_atlas_colours.iloc[:, 0].to_dict()
Read in the data, including Nadia's annotations (an Excel spreadsheet with multiple tabs).
data = pd.read_csv('/Users/pwangel/Downloads/myeloid_atlas_expression_v7.1.tsv', sep='\t', index_col=0)
annotations = pd.read_csv('/Users/pwangel/PlotlyWorkspace/combine_data/blood/outputs_for_front_end/iMac_annotations.tsv', sep='\t', index_col=0)
genes = pd.read_csv('/Users/pwangel/Downloads/myeloid_atlas_genes.tsv', sep='\t', index_col=0)
data = functions.transform_to_percentile(data)
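The helper `functions.transform_to_percentile` is not shown in this notebook. As a minimal sketch, assuming it replaces each sample's values (genes as rows, samples as columns) with their percentile rank within that sample, a pandas equivalent could look like this. `percentile_transform` and the toy matrix are hypothetical.

```python
import pandas as pd

def percentile_transform(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for functions.transform_to_percentile.

    Assumes genes are rows and samples are columns. Each value is replaced by its
    percentile rank within its own sample (column), scaled to (0, 1]; ties share
    the average rank.
    """
    return df.rank(axis=0, pct=True)

# Toy expression matrix: 4 genes x 2 samples
expr = pd.DataFrame(
    {"sample_a": [5.0, 1.0, 3.0, 2.0], "sample_b": [10.0, 40.0, 20.0, 30.0]},
    index=["g1", "g2", "g3", "g4"],
)
print(percentile_transform(expr))
```

Under this assumption, applying the transform again to a subset of genes, as the PCA cell does, re-ranks each sample within that subset only.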
The gene platform variance fraction only needs to be computed once. A previously calculated version has already been read into the genes dataframe above, so the two lines below are commented out.
#genes = functions.calculate_platform_dependence(data, annotations)
#genes.to_csv('/Users/pwangel/Downloads/myeloid_atlas_genes.tsv', sep='\t')
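The notebook does not show how `functions.calculate_platform_dependence` computes `Platform_VarFraction`. One common way to measure how much of a gene's variance is due to platform is the between-group share of the total sum of squares (eta squared). The sketch below assumes that definition; `platform_variance_fraction` and its inputs are hypothetical.

```python
import pandas as pd

def platform_variance_fraction(data: pd.DataFrame, platform: pd.Series) -> pd.Series:
    """Hypothetical sketch: per-gene share of variance explained by platform.

    data: genes x samples; platform: platform label per sample, indexed by sample.
    Returns SS_between / SS_total for each gene (0 = no platform effect, 1 = all platform).
    """
    platform = platform.loc[data.columns]  # align labels to the sample order
    grand_mean = data.mean(axis=1)
    ss_total = data.sub(grand_mean, axis=0).pow(2).sum(axis=1)
    ss_between = pd.Series(0.0, index=data.index)
    for _, cols in platform.groupby(platform).groups.items():
        group = data[cols]
        # each platform contributes n_g * (platform mean - grand mean)^2
        ss_between += group.shape[1] * (group.mean(axis=1) - grand_mean) ** 2
    return ss_between / ss_total

data = pd.DataFrame(
    {"s1": [1.0, 1.0], "s2": [1.0, 3.0], "s3": [5.0, 1.0], "s4": [5.0, 3.0]},
    index=["platform_gene", "other_gene"],
)
platform = pd.Series(["A", "A", "B", "B"], index=["s1", "s2", "s3", "s4"])
frac = platform_variance_fraction(data, platform)
print(frac)
print(frac[frac <= 0.2].index.tolist())  # ['other_gene']
```

Genes with a value near 0 vary mostly within platforms, so a cut such as `<= 0.2` keeps the less platform-dependent genes. A constant gene gives 0/0 and comes out as NaN, which such a cut drops.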
# Keep genes whose platform variance fraction is at most 0.2, then transpose so samples are rows
# (the boolean mask is positional: it assumes genes and data list genes in the same order)
filtered = functions.transform_to_percentile(data.loc[genes.Platform_VarFraction.values <= 0.2]).transpose()
pca = sklearn.decomposition.PCA(n_components=10, svd_solver='full')
pca.fit(filtered)
pca_coords = pca.transform(filtered)
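The PCA cell selects genes with a positional boolean mask (`.values`), which relies on `genes` and `data` listing genes in the same order. The sketch below, on synthetic data with hypothetical names, shows a label-based selection that avoids that assumption, and uses `fit_transform`, which gives the same coordinates as separate `fit` and `transform` calls.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic genes x samples matrix and per-gene platform variance fractions (illustrative only)
data = pd.DataFrame(
    rng.normal(size=(50, 12)),
    index=[f"gene{i}" for i in range(50)],
    columns=[f"sample{j}" for j in range(12)],
)
var_fraction = pd.Series(np.linspace(0.0, 1.0, 50), index=data.index)

# Label-based selection: gene names, not row positions, decide what is kept
keep = var_fraction[var_fraction <= 0.2].index
X = data.loc[keep].T  # samples as rows, genes as columns, as PCA expects

pca = PCA(n_components=5, svd_solver="full")
coords = pca.fit_transform(X)
print(len(keep), coords.shape)  # 10 (12, 5)
```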
Plot the PCA, colouring samples by cell type, platform category and dataset.
functions.plot_pca(pca_coords, annotations, pca,
                   labels=['celltype', 'Platform_Category', 'Dataset'], colour_dict=blood_atlas_colours)
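How `functions.plot_pca` applies `colour_dict` is not shown. A minimal sketch of one way to turn an annotation column into per-sample colours, assuming labels missing from the dictionary should fall back to a neutral grey; `colours_for`, the cell-type labels and the hex codes are made up for illustration.

```python
import pandas as pd

def colours_for(labels: pd.Series, colour_dict: dict, default: str = "#cccccc") -> pd.Series:
    """Hypothetical helper: map each sample's label to a colour, with a fallback for unlisted labels."""
    return labels.map(colour_dict).fillna(default)

annotations = pd.DataFrame(
    {"celltype": ["monocyte", "macrophage", "unknown_type"]},
    index=["s1", "s2", "s3"],
)
colour_dict = {"monocyte": "#1f77b4", "macrophage": "#ff7f0e"}
print(colours_for(annotations["celltype"], colour_dict).tolist())
# ['#1f77b4', '#ff7f0e', '#cccccc']
```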
functions.plot_gene_platform_dependence_distribution(data, annotations, genes)
Plot the threshold-lowering process using the Kruskal-Wallis H test.
functions.plot_KW_Htest(data, annotations, genes)
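The internals of `functions.plot_KW_Htest` are not shown. One plausible reading of the threshold-lowering process, sketched here purely as an assumption, is to compute a per-gene Kruskal-Wallis H statistic across platform groups with `scipy.stats.kruskal`, then track a summary of the genes retained as the variance-fraction threshold is lowered. The function names, the synthetic data and the choice of the median as the summary are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

def kw_h_per_gene(data: pd.DataFrame, platform: pd.Series) -> pd.Series:
    """Kruskal-Wallis H statistic across platform groups, one value per gene.

    data: genes x samples; platform: platform label per sample, indexed by sample.
    """
    platform = platform.loc[data.columns]
    masks = [(platform == label).to_numpy() for label in platform.unique()]
    values = data.to_numpy()
    h = [kruskal(*[row[m] for m in masks]).statistic for row in values]
    return pd.Series(h, index=data.index)

def median_h_by_threshold(h: pd.Series, var_fraction: pd.Series, thresholds) -> pd.Series:
    """Median H of the genes kept (var_fraction <= t) at each threshold t."""
    return pd.Series({t: h[var_fraction <= t].median() for t in thresholds})

rng = np.random.default_rng(1)
samples = [f"s{i}" for i in range(20)]
platform = pd.Series(["A"] * 10 + ["B"] * 10, index=samples)
values = rng.normal(size=(6, 20))
values[:3, 10:] += 10.0  # genes 0-2: strong, illustrative platform effect
data = pd.DataFrame(values, index=[f"gene{i}" for i in range(6)], columns=samples)
var_fraction = pd.Series([0.9, 0.8, 0.7, 0.1, 0.05, 0.02], index=data.index)

h = kw_h_per_gene(data, platform)
sweep = median_h_by_threshold(h, var_fraction, [1.0, 0.5, 0.2])
print(sweep)
```

With two platforms of 10 samples each and no ties, H cannot exceed 100/7 (about 14.29), the value reached when the groups are completely separated, as for the three shifted genes here. Under this reading, a median H that falls as the threshold is lowered would indicate that the retained genes depend less on platform.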